University of Konstanz


VAST 2009 Challenge
Challenge 3 - Video Analysis

Authors and Affiliations:

Dr. Peter Bak, University of Konstanz, bak@dbvis.inf.uni-konstanz.de [PRIMARY contact]

Patrick Jungk, University of Konstanz, patrick.jungk@uni-konstanz.de [lead development, analyst]

Tool(s):

VAT – Video Analysing Tool

Developed at: University of Konstanz

by: Patrick Jungk

Version 1.0




KNIME – Konstanz Information Miner

Developed at: University of Konstanz

by: KNIME CORE TEAM

Version 2.0.3

KNIME, pronounced [naim], is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively execute some or all analysis steps, and later investigate the results through interactive views on data and models. […]

www.knime.org

Video:

VAST.wmv

 

 

ANSWERS:

Short Answer


Figure 1: Analysis process of video data




In order to reduce the data sufficiently for successful information extraction, the following basic steps are indispensable:

The basic concept is to determine events by detecting moving objects and surrounding them with bounding boxes for classification. Afterwards, suspicious

behavioural patterns can be detected automatically and verified manually.




Figure 2: Detection of possible events requires data reduction (part below shows the determined patterns)



The retrieved data can be evaluated quickly by visualizing the relevant

patterns. The initial data can be reduced to less than 1% of its original volume, which greatly decreases the expenditure of human labour for

long videos of more than 1 h.



MC3.1: Provide a tab-delimited table containing the location, start time and duration of the events identified above. Please name the file Video.txt and place it in the same directory as your index.htm file.  Please see the format required in the Task Descriptions.

Video.txt


MC3.2:  Identify any events of potential counterintelligence/espionage interest in the video.  Provide a Detailed Answer, including a description of any activities, and why the event is of interest. 

Table of contents:

1 Preliminary Considerations

2 Analysis

2.1 Determination of bounding boxes

2.2 Classification of Bounding Boxes

3 Determination of Suspicious Events

4 Result

4.1 Suspicious Events

4.2 Performance Comparison of Automatic and Interactive Parts

4.3 Data Reduction

4.4 Conclusion



List of figures

Figure 1: Classification needs interactive user involvement

Figure 2: KDD pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P., 1996)

Figure 3: Analysis process of video data

Figure 4: Classification needs user interaction and computing using prediction algorithms

Figure 5: Patterns can be verified manually and marked for export to a result table



List of Tables

Table 1: Mapped colours to the classified bounding boxes for visualisation

Table 2: Comparison of user and hardware process times for video 1

Table 3: Data reduction of video one leads to relevant events



1 Preliminary Considerations

To identify any events of potential counterintelligence/espionage interest, a definition of such a suspicious event needs to be given. The following events were defined as suspicious:

These events need to be described in a formal way as behavioural patterns. To recognize such an event, the following dimensions have to be considered as well:

Suspicious items are as follows:

Suspicious areas are as follows:

Those areas specify the areas of interest. In order to determine events, items within an area have to be recognized. Therefore, areas of movement within the video need to be detected, since every moving object may indicate suspicious activity. Those areas of movement are to be marked and classified, see figure 1.


Figure 1: Classification needs interactive user involvement
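The area test described above can be sketched as a simple rectangle-overlap check; a bounding box is of interest if it intersects one of the predefined areas. This is an illustrative sketch: the coordinates and the area definitions are hypothetical, not taken from the tool.

```python
# Sketch: a suspicious area is an axis-aligned rectangle in frame coordinates;
# a detected bounding box is of interest if it overlaps such an area.
# The coordinates below are made up for illustration.

def overlaps(box, area):
    """Both arguments are (x_min, y_min, x_max, y_max) rectangles."""
    bx0, by0, bx1, by1 = box
    ax0, ay0, ax1, ay1 = area
    return bx0 <= ax1 and ax0 <= bx1 and by0 <= ay1 and ay0 <= by1

suspicious_areas = [(100, 50, 200, 150)]  # assumed area of interest
box = (180, 140, 220, 170)                # a detected moving object
print(any(overlaps(box, a) for a in suspicious_areas))  # True
```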




The following types are considered potentially suspicious and need to be determined.

All other moving areas are not considered suspicious. This means those areas are irrelevant and can be excluded.

2 Analysis

To analyse the video data, an interactive process based on the KDD (Knowledge Discovery in Databases) pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. From Data Mining to Knowledge Discovery: An Overview. Advances in Knowledge Discovery and Data Mining. (1996), 1-34.) was used, as shown in figure 2.


Figure 2: KDD pipeline (Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. From Data Mining to Knowledge Discovery: An Overview. Advances in Knowledge Discovery and Data Mining. (1996), 1-34., http://www.aaai.org/aitopics/assets/PDF/AIMag17-03-2-article.pdf)




Following this terminology, a flow chart was created describing the operative steps required to conduct a successful analysis of video stream data.


Figure 3: Analysis process of video data



In order to extract the data from the video, thresholds have to be set manually by the user. The most important thresholds are as follows.
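One such threshold is the grey-value difference that decides whether a pixel counts as moving. A minimal frame-differencing sketch, assuming frames are plain 2D lists of grey values and a manually chosen threshold (the value below is an assumption, not taken from VAT):

```python
# Frame differencing: pixels whose grey value changed by more than a
# user-set threshold are marked as "moving" in a binary mask.

DIFF_THRESHOLD = 30  # assumed threshold, tuned manually by the analyst

def motion_mask(prev_frame, curr_frame, threshold=DIFF_THRESHOLD):
    """Return a binary mask marking pixels whose grey value changed."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
        for prow, crow in zip(prev_frame, curr_frame)
    ]

prev = [[10, 10, 10], [10, 10, 10]]
curr = [[10, 90, 10], [10, 95, 10]]
print(motion_mask(prev, curr))  # [[0, 1, 0], [0, 1, 0]]
```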



2.1 Determination of bounding boxes

The determination chain yields the bounding boxes inside a frame. Figure 1 shows the result of the automatic determination of bounding boxes. The colour bar below the frame preview shows the count of bounding boxes over time; each line stands for one location, starting with the first one. The lighter the colour, the more bounding boxes were found in that time slot. Each movement of the camera position indicates a change of location, which yields the location information relevant for the result.
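The step from a binary motion mask to bounding boxes can be sketched as a connected-components pass: each blob of moving pixels is surrounded by an axis-aligned box. This is a pure-Python illustration; the tool's actual implementation and connectivity rules may differ.

```python
# Connected components (4-connected, BFS) over a binary motion mask;
# each component is reduced to its bounding box.
from collections import deque

def bounding_boxes(mask):
    """mask: 2D list of 0/1. Returns (x_min, y_min, x_max, y_max) per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                q = deque([(x, y)])
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while q:
                    cx, cy = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes

mask = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 1]]
print(bounding_boxes(mask))  # [(1, 0, 2, 1), (3, 2, 3, 2)]
```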



2.2 Classification of Bounding Boxes

The process of classification is an interactive process divided into two sub-processes:

  1. manual classification of a subset of bounding boxes (training)

  2. automatic classification of the remaining bounding boxes using a neural network (Multi Layer Perceptron Predictor) or a Decision Tree Predictor
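The two-step idea can be sketched with a toy classifier: a manually labelled subset trains a model that then predicts the remaining boxes. A 1-nearest-neighbour classifier on simple box features stands in here for KNIME's Multi Layer Perceptron / Decision Tree predictors; the features and labels are assumptions for illustration.

```python
# Sketch of the classify-then-predict step: 1-NN on (width, height) of each
# bounding box, standing in for the MLP / decision-tree predictors in KNIME.

def features(box):
    """Width and height of an (x_min, y_min, x_max, y_max) box."""
    x0, y0, x1, y1 = box
    return (x1 - x0, y1 - y0)

def predict(training, box):
    """training: list of (box, label) pairs from the manual step."""
    fx, fy = features(box)
    def dist(item):
        tx, ty = features(item[0])
        return (tx - fx) ** 2 + (ty - fy) ** 2
    return min(training, key=dist)[1]

training = [((0, 0, 8, 20), "human"),  # tall, narrow blob
            ((0, 0, 40, 18), "car")]   # wide, flat blob
print(predict(training, (5, 5, 14, 26)))  # human
```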

No.   name        colour   R    G    B
1     human       green    77   157  74
2     two humans  orange   255  127  0
3     car         red      228  26   28
4     two cars    blue     126  126  184

Table 1: Mapped colours to the classified bounding boxes for visualisation
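The mapping of Table 1 amounts to a simple lookup used when painting classified bounding boxes into the visualisation:

```python
# Class-to-colour mapping from Table 1 (RGB values as given there).
CLASS_COLOURS = {
    "human":      (77, 157, 74),    # green
    "two humans": (255, 127, 0),    # orange
    "car":        (228, 26, 28),    # red
    "two cars":   (126, 126, 184),  # blue
}
print(CLASS_COLOURS["car"])  # (228, 26, 28)
```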



This training data is used in the next step (Multi Layer Perceptron Predictor, Decision Tree Predictor), as figure 4 shows. At the end of this step, the result is a table containing all bounding boxes of one sub-video.


Figure 4: Classification needs user interaction and computing using prediction algorithms






3 Determination of Suspicious Events

Once the patterns have been recognized, the suspicious events can be reviewed by visualisation of the patterns.


Figure 5: Patterns can be verified manually and marked for export to a result table
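A pattern check of this kind can be sketched as a filter over the classified rows (frame number, class label, bounding box). The concrete pattern below, "two humans inside an area of interest", and all coordinates are hypothetical, chosen only to illustrate the mechanism.

```python
# Sketch: scan classified rows for a pattern such as "two humans inside an
# area of interest". Pattern definition and coordinates are made up.

def overlaps(box, area):
    bx0, by0, bx1, by1 = box
    ax0, ay0, ax1, ay1 = area
    return bx0 <= ax1 and ax0 <= bx1 and by0 <= ay1 and ay0 <= by1

def suspicious_events(rows, area):
    """rows: (frame_no, label, box). Keep 'two humans' events inside area."""
    return [(f, lbl, box) for f, lbl, box in rows
            if lbl == "two humans" and overlaps(box, area)]

rows = [(10, "car",        (0, 0, 30, 10)),
        (11, "two humans", (50, 50, 60, 70)),
        (12, "two humans", (400, 0, 410, 20))]
print(suspicious_events(rows, (40, 40, 80, 80)))
# [(11, 'two humans', (50, 50, 60, 70))]
```

Events surviving this filter are the candidates the analyst reviews manually, as figure 5 shows.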




4 Result

4.1 Suspicious Events

As a result, the most relevant pattern is

This also means that two persons may walk down a street together (which implies a previous meeting).

The pattern:

needs to be redefined for another run, since too many events were found.



4.2 Performance Comparison of Automatic and Interactive Parts

Performance is assessed for the interactive and automatic parts of the process chain. Process times for the user as well as for the hardware (server, PC) are listed separately in table 2.

No.   Process step                                 time in min (user)   time in min (HW)
1     Frame Extraction                             0                    180 - 360
2     Set up Thresholds                            5 - 15               5 - 15
3     Determination of bounding boxes              0                    180 - 240
4     Classifying of a subset of bounding boxes    5 - 15               5 - 15
5     Filtering Boxes                              <1                   <1
6     Visualisation                                0                    <1
7     Pattern Recognition                          0                    <1
8     Pattern Recognition Review                   5 - 30               0

Table 2: Comparison of user and hardware process times for video 1



4.3 Data Reduction

Table 3 shows the reduction of data for video 1. The final relevant data was reduced to 0.008 % of the potentially relevant data.

No.   Process step                                 count of table rows (input)   count of table rows (output)
1     Frame Extraction                             0                             0
2     Set up Thresholds                            0                             0
3     Determination of bounding boxes              0                             143528
4     Classifying of a subset of bounding boxes    143528                        143528
5     Filtering bounding boxes                     143528                        93865
6     Visualisation                                93865                         93865
7     Pattern Recognition                          93865                         3859
8     Pattern Recognition Review                   3859                          12

Table 3: Data reduction of video 1 leads to relevant events
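The quoted reduction ratio follows directly from the first and last row counts of the pipeline (143528 bounding boxes down to 12 reviewed events):

```python
# Check of the data-reduction figure: 12 reviewed events out of
# 143528 initially determined bounding boxes.
initial, final = 143528, 12
print(f"{final / initial:.3%}")  # 0.008%
```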



4.4 Conclusion

Compared to the complete video time (4 h), the user interaction takes between 25 and 70 minutes. The VAT tool enables an analyst to focus her/his attention on a limited number of automatically preselected events, whereas it would otherwise be very difficult and exhausting to watch whole videos of several hours attentively.